Statistical Analysis and Application of Ensemble Method on the Netflix Challenge

Authors

  • Jack Cheng
  • Virginia Chu
  • Yang Wang
Abstract

1. Introduction

The Netflix Prize project was proposed by Netflix, Inc. to seek accurate predictions of movie ratings. As one group in the Stanford Netflix Prize team, our responsibility is to explore useful statistics and data curation in the training data set, and to explore ensemble methods for improving prediction accuracy. We imported the Netflix data into a MySQL database for aggregation, and then analyzed the aggregated results using Matlab or C++ scripts. So far, we have completed multiple clustering analyses of the movies and the customers using the K-means clustering technique learned in class [1]. We clustered the movies by several criteria: the number of ratings a movie received, its average rating, the monthly time progression of its rating counts and rating averages, and the probability of each rating value for the movie. The customers are clustered by similar criteria, except for the time progression, because a customer's monthly rating counts and rating averages change from month to month depending on the movies watched in those months. After the training data had been clustered by these criteria, we used ensemble methods to combine the advantages of various classifiers and obtain improved results.
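The K-means step described above can be illustrated with a minimal sketch. It is not the authors' Matlab or C++ code: the function name `kmeans`, the two features (log10 of the rating count and the average rating), and the six toy rows are all assumptions made for illustration, and plain Lloyd iterations with NumPy stand in for whatever implementation was actually used.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative helper); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Hypothetical per-movie features: [log10(number of ratings), average rating].
movie_features = np.array([
    [2.0, 2.1], [2.2, 2.4], [2.1, 1.9],   # few ratings, low average
    [4.8, 4.1], [5.0, 4.3], [4.9, 3.9],   # many ratings, high average
])

centroids, labels = kmeans(movie_features, k=2)
print(labels)
```

In practice the per-movie aggregates would come from the database queries mentioned in the text, and features on different scales would usually be standardised before clustering so that no single criterion dominates the distance.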

Similar resources

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

A Preprocessing Technique to Investigate the Stability of Multi-Objective Heuristic Ensemble Classifiers

Background and Objectives: Given the random nature of heuristic algorithms, stability analysis of heuristic ensemble classifiers is particularly important. Methods: The novelty of this paper is the use, for the first time, of a statistical method consisting of Plackett-Burman design and Taguchi to specify not only the important parameters but also their optimal levels. Minitab and Design Expert ...

A Fault Diagnosis Method for Automaton based on Morphological Component Analysis and Ensemble Empirical Mode Decomposition

In the fault diagnosis of an automaton, the vibration signal is non-stationary and non-periodic, which makes it difficult to extract the fault features. To solve this problem, an automaton fault diagnosis method based on morphological component analysis (MCA) and ensemble empirical mode decomposition (EEMD) was proposed. Based on the advantages of the morphological component analysis method i...

Metalearning for Dynamic Integration in Ensemble Methods

Ensemble methods have been receiving an increasing amount of attention, especially because of their successful application to high-visibility problems (e.g., the Netflix Prize). An important challenge in ensemble learning (EL) is the management of the set of models to ensure a high level of accuracy, particularly with a large number of models and in highly dynamic environments [49]. One approach ...

Publication date: 2006